oracle-based adversarial contextual bandit
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Reviews: Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits
The paper makes some interesting contribution by proposal a new partial information relaxation to improve the regret bound. The analysis of the partial info relaxation bears some similarity to that in Rahklin and Sridharan, but the inclusion of the Radamacher term is new. By using Lemma 2, the bound is improved. While this is definitely a new result and there are new ideas, the similarity to current literature kind of compels me to give a '3' instead of a '4' for the Novelty/Originality and Technical Constribution scores. Presentation: The paper is concise and well written, but I think that the proof of admissability between 5 and 7 is too "straight-lined".
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.49)
Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits
Syrgkanis, Vasilis, Luo, Haipeng, Krishnamurthy, Akshay, Schapire, Robert E.
We propose a new oracle-based algorithm, BISTRO, for the adversarial contextual bandit problem, where either contexts are drawn i.i.d. or the sequence of contexts is known a priori, but where the losses are picked adversarially. Our algorithm is computationally efficient, assuming access to an offline optimization oracle, and enjoys a regret of order $O((KT) {\frac{2}{3}}(\log N) {\frac{1}{3}})$, where $K$ is the number of actions, $T$ is the number of iterations, and $N$ is the number of baseline policies. Our result is the first to break the $O(T {\frac{3}{4}})$ barrier achieved by recent algorithms, which was left as a major open problem. Our analysis employs the recent relaxation framework of (Rakhlin and Sridharan, ICML'16). Papers published at the Neural Information Processing Systems Conference.
Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits
Syrgkanis, Vasilis, Luo, Haipeng, Krishnamurthy, Akshay, Schapire, Robert E.
We propose a new oracle-based algorithm, BISTRO+, for the adversarial contextual bandit problem, where either contexts are drawn i.i.d. or the sequence of contexts is known a priori, but where the losses are picked adversarially. Our algorithm is computationally efficient, assuming access to an offline optimization oracle, and enjoys a regret of order $O((KT)^{\frac{2}{3}}(\log N)^{\frac{1}{3}})$, where $K$ is the number of actions, $T$ is the number of iterations, and $N$ is the number of baseline policies. Our result is the first to break the $O(T^{\frac{3}{4}})$ barrier achieved by recent algorithms, which was left as a major open problem. Our analysis employs the recent relaxation framework of (Rakhlin and Sridharan, ICML'16).
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.49)